Abstract

An adjacency labeling scheme is a method that assigns labels to the vertices of a graph such that 
adjacency between vertices can be inferred directly from the assigned label, without using a centralized 
data structure. We devise adjacency labeling schemes for the family of power-law graphs. This family
has been used to model many types of networks, e.g. the Internet AS-level graph. Furthermore,
we prove an almost matching lower bound for this family. We also provide an asymptotically near- 
optimal labeling scheme for sparse graphs. Finally, we validate the efficiency of our labeling scheme 
by an experimental evaluation using both synthetic data and real-world networks of up to hundreds of 
thousands of vertices. 



1 Introduction 


A fundamental problem in networks is how to disseminate the structural information of the underlying graph 
of a network to its vertices. The purpose of such dissemination is that the local topology of the network can 
be inferred using only local information stored in each vertex without using costly access to large, global data 
structures. One way of doing so is via labeling schemes: an algorithm that assigns a bit string (a label) to
each vertex so that a query between any two vertices can be answered solely from their respective labels. The
main objective of labeling schemes is to minimize the maximum label size: the maximum number of bits
used in a label of any vertex. Labeling schemes for adjacency and other properties have found practical use
in XML search engines [26], mapping services [1] and routing [40].

In this paper we are particularly interested in labeling schemes for adjacency queries. For general
graphs, Moon [44] showed lower and upper bounds of n/2 and n/2 + log n bits, respectively, on the label size.
The asymptotic gap between these bounds was only recently closed by Alstrup et al. [S] who proved an
upper bound of n/2 + 6 bits. Upper bounds for adjacency labeling schemes exist for many specific classes of
graphs, including trees [10], planar graphs [29], bounded-degree graphs [3], and bipartite graphs [42].

However, for classes of graphs whose statistical properties, in particular their degree distribution, more
closely resemble those of real-world networks, there has, to our knowledge, been no research on adjacency
labeling schemes. One class of graphs extensively used for modelling real-world networks is power-law graphs:
roughly, n-vertex graphs where the number of vertices of degree k is proportional to $n/k^\alpha$ for some positive $\alpha$.
Power-law graphs (also called scale-free graphs in the literature) have been used, e.g., to model the Internet
AS-level graph nail, and many other types of network (see, e.g., [43, 25] for overviews). The adequacy
of fit of power-law graph models to actual data, as well as the empirical correctness of the conjectured
mechanisms giving rise to power-law behaviour, have been subject to criticism (see, e.g., jHHS]). In spite
of such criticism, and because their degree distribution affords a reasonable approximation of the degree
distribution of many networks, the class of power-law graphs remains a popular tool in network modelling
whose statistical behaviour is well understood: e.g., for power-law graphs with $2 < \alpha < 3$, the range most
often seen in the modeling of real-world networks [25], it is known that with high probability the average
distance between any two vertices is O(log log n), the diameter is O(log n) and there exists a dense subgraph
of $n^{c/\log\log n}$ vertices jUj.

Routing labeling schemes for power-law graphs have been investigated by Brady and Cowen CHI, and by
Chen et al. [21]. Labeling schemes for properties other than adjacency have been investigated for various
classes of graphs, e.g., distance [30] and flow [35]. Dynamic labeling schemes were studied by Korman and
Peleg EZIIMIISS] and recently by Dahlgaard et al. [57]. Experimental evaluation of some labeling schemes
for various properties on general graphs has been performed by Caminiti et al. [20], Fischer [28] and Rotbart
et al. [H].

Adjacency labeling schemes are tightly coupled with the graph-theoretic concept of induced universal
graphs. Given a graph family F, the aim is to find the smallest N such that a graph of N vertices contains
all graphs in F as induced subgraphs. Kannan, Naor and Rudich [33] showed that an f(n) log n adjacency
labeling scheme for F constructs an induced universal graph for this family of $2^{f(n)\log n}$ vertices. Some of the
adjacency labeling schemes reported earlier contributed a better bound than was known for induced universal
graphs (see e.g. [Mdn]). In the context of sparse graphs, universal graphs for this family
were investigated both by Babai et al. [14] and by Alon and Asodi [7].

1.1 Our contribution 

Our contributions are: 

An $O(\sqrt[\alpha]{n}(\log n)^{1-1/\alpha})$ adjacency labeling scheme for power-law graphs G. The scheme is based on
two ideas: (I) a labeling strategy that partitions the vertices of G into high-degree (“fat”) and low-degree (“thin”)
vertices based on a threshold degree, and (II) a threshold prediction that depends only on the coefficient $\alpha$
of a power-law curve fitted to the degree distribution of G. Real-world power-law graphs rarely exceed 10^° 
vertices, implying a label size of at most 10® bits, well within the processing capabilities of current hardware.
We claim that our scheme is thus appealing in practice due both to its simplicity and the small size of its
labels. Using the same ideas, we get an asymptotically near-tight $O(\sqrt{n\log n})$ adjacency labeling scheme for
sparse graphs.

Footnote on universal graphs: a graph that contains each graph from the graph family as subgraph, not necessarily induced.

A lower bound of $\Omega(\sqrt[\alpha]{n})$ bits on the maximum label size for any adjacency labeling scheme
for power-law graphs. To this end we define a restrictive subclass of power-law graphs and show that
it is contained in the bigger class we study for the upper bound; we show that this subclass requires label size
$\Omega(\sqrt[\alpha]{n})$ for n-vertex graphs. This lower bound shows that our upper bound above is asymptotically optimal,
bar a $(\log n)^{1-1/\alpha}$ factor. By the connections between adjacency labeling schemes and universal graphs, we
also obtain upper and lower bounds for induced universal graphs for power-law graphs.

An experimental investigation of our labeling scheme. Using both real-world (23K-325K vertices)
and synthetic (300K-1M vertices) data sets, we observe that: (i) our threshold prediction performs close to
optimal when using the labeling strategy above, and (ii) our labeling scheme achieves maximum label size several
orders of magnitude smaller than the state-of-the-art labeling schemes for more general graph families.

In addition, our study may contribute to the understanding of the quality of generative models, that is,
procedures that “grow” random graphs whose degree distributions are with high probability “close” to
power-law graphs, such as the Barabasi-Albert model (TS] and the Aiello-Chung-Lu model [1]. As a first
step, we provide evidence that the randomized Barabasi-Albert model m produces only a small fraction
of the power-law graphs possible.


2 Graph Families Related to Power-Law Graphs 

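In this section we define two families of graphs $\mathcal{P}_\alpha$ and $\mathcal{P}'_\alpha$ with $\mathcal{P}'_\alpha \subseteq \mathcal{P}_\alpha$. Family $\mathcal{P}_\alpha$ is rich enough to
contain the graphs whose degree distribution is approximately, or perfectly, power-law distributed, and our
upper bound on the label size for our labeling scheme holds for any graph in $\mathcal{P}_\alpha$. Family $\mathcal{P}'_\alpha$ is used to show
our lower bound. In the following, let $i_1 = \Theta(\sqrt[\alpha]{n})$ be the smallest integer such that $\lfloor Cn/i_1^\alpha \rfloor \le 1$, and let
$C' \ge \left(\frac{C}{\alpha-1} + 5\right)^\alpha + \frac{C}{\alpha-1}$ be a constant; we shall use $C'$ in the remainder of the paper.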

Definition 1. Let $\alpha > 1$ be a real number. $\mathcal{P}_\alpha$ is the family of graphs G such that if $n = |V(G)|$ then for
all integers k between $\sqrt[\alpha]{n/\log n}$ and $n - 1$, $\sum_{i=k}^{n-1} |V_i| \le C'\left(\frac{n}{k^{\alpha-1}}\right)$.

The class of $\alpha$-proper power-law graphs contains graphs where the number of vertices of degree k must
be rounded either up or down and the number of vertices of degree k is non-increasing with k. Note
that the function $k \mapsto Cnk^{-\alpha}$ is strictly decreasing.

Definition 2. Let $\alpha > 1$ be a real number. We say that an n-vertex graph G = (V, E) is an $\alpha$-proper
power-law graph if

1. $\lfloor Cn \rfloor - i_1 - 1 \le |V_1| \le \lceil Cn \rceil$,

2. $\lfloor Cn/2^\alpha \rfloor \le |V_2| \le \lceil Cn/2^\alpha \rceil + 1$,

3. for every i with $3 \le i \le n$: $|V_i| \in \{\lfloor Cn/i^\alpha \rfloor, \lceil Cn/i^\alpha \rceil\}$, and

4. for every i with $2 \le i \le n - 1$: $|V_i| \ge |V_{i+1}|$.

The family of $\alpha$-proper power-law graphs is denoted $\mathcal{P}'_\alpha$.
Note that we allow slightly more noise in the sizes of $V_1$ and $V_2$ than in the remaining sets; without it,
it seems tricky to prove a better lower bound than $\Omega(\sqrt[\alpha+1]{n})$ on label sizes.
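To make Definition 2 more concrete, the following minimal sketch computes the ideal class size $Cn/i^\alpha$ and the index $i_1$ numerically. It assumes that $C = 1/\zeta(\alpha)$, i.e. the normalisation under which the ideal class sizes sum to n; the helper names `zeta_approx`, `ideal_class_size` and `smallest_i1` are hypothetical and only serve the illustration.

```python
import math


def zeta_approx(alpha, terms=100_000):
    """Approximate zeta(alpha) for alpha > 1.

    Partial sum of the series plus an integral estimate of the tail
    (first-order Euler-Maclaurin correction). This is an approximation,
    not an exact evaluation.
    """
    head = sum(i ** -alpha for i in range(1, terms + 1))
    tail = terms ** (1 - alpha) / (alpha - 1) - 0.5 * terms ** -alpha
    return head + tail


def ideal_class_size(n, alpha, i, C):
    """Ideal (unrounded) number of vertices of degree i: C * n / i^alpha."""
    return C * n / i ** alpha


def smallest_i1(n, alpha, C):
    """Smallest integer i with floor(C * n / i^alpha) <= 1."""
    i = 1
    while math.floor(ideal_class_size(n, alpha, i, C)) > 1:
        i += 1
    return i


if __name__ == "__main__":
    n, alpha = 10**6, 2.5
    C = 1 / zeta_approx(alpha)  # assumption: C = 1 / zeta(alpha)
    i1 = smallest_i1(n, alpha, C)
    print("C =", round(C, 4), " i_1 =", i1, " n^(1/alpha) =", round(n ** (1 / alpha), 1))
```

Under these assumptions $i_1$ stays within a constant factor of $\sqrt[\alpha]{n}$, in line with $i_1 = \Theta(\sqrt[\alpha]{n})$.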

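We show the following properties of $\mathcal{P}'_\alpha$.

Proposition 1. The maximum degree in an n-vertex graph in $\mathcal{P}'_\alpha$ is at most $\left(\frac{C}{\alpha-1} + 2\right)\sqrt[\alpha]{n} + i_1 + 3 = O(\sqrt[\alpha]{n})$.

Proof. Let n > 0 be an integer and let $k' = \lceil\sqrt[\alpha]{n}\rceil$. Furthermore, let $S_{k'} = \sum_{i=1}^{k'} |V_i|$, that is, $S_{k'}$ is the
number of vertices of degree at most k'. Let $S^-_{k'} = \left(\sum_{i=1}^{k'} \lfloor Cni^{-\alpha} \rfloor\right) - i_1 - 1$. Then $S_{k'} \ge S^-_{k'}$. We now bound $S^-_{k'}$
from below. Summing over every i with $1 \le i \le k'$,

$$S^-_{k'} + k' = -i_1 - 1 + \sum_{i=1}^{k'} \left(\lfloor Cni^{-\alpha} \rfloor + 1\right) > -i_1 - 1 + \sum_{i=1}^{k'} Cni^{-\alpha} = -i_1 - 1 + Cn\sum_{i=1}^{k'} i^{-\alpha}$$

$$\ge -i_1 - 1 + n - Cn\int_{k'}^{\infty} x^{-\alpha}\,dx = n - \frac{C}{\alpha-1}\,n\,k'^{-\alpha+1} - i_1 - 1 \ge n - \frac{C}{\alpha-1}\sqrt[\alpha]{n} - i_1 - 1,$$

where the second line uses $C\sum_{i \ge 1} i^{-\alpha} = 1$ and $k' \ge \sqrt[\alpha]{n}$,
giving $S_{k'} \ge S^-_{k'} > n - \frac{C}{\alpha-1}\sqrt[\alpha]{n} - \lceil\sqrt[\alpha]{n}\rceil - i_1 - 1$. There are thus at most $\frac{C}{\alpha-1}\sqrt[\alpha]{n} + \lceil\sqrt[\alpha]{n}\rceil + i_1 + 1$ vertices
of degree strictly more than $k' = \lceil\sqrt[\alpha]{n}\rceil$. Since for every $1 < i \le n - 1$: $|V_i| \ge |V_{i+1}|$, it follows that the
maximum degree of any $\alpha$-proper power-law graph is at most $\left(\frac{C}{\alpha-1} + 2\right)\sqrt[\alpha]{n} + i_1 + 3$. □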

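Proposition 2. For $\alpha > 2$, all graphs in $\mathcal{P}'_\alpha$ are sparse.

Proof. By Proposition 1, the maximum degree of an n-vertex $\alpha$-proper power-law graph is at most $k' =
\left(\frac{C}{\alpha-1} + 2\right)\sqrt[\alpha]{n} + i_1 + 3$, whence the total number of edges is at most $\frac{1}{2}\sum_{k=1}^{k'} k|V_k|$. By definition, $|V_k| \le
Cn/k^\alpha + 1$ for $k \ne 2$ and $|V_2| \le \lceil Cn/2^\alpha \rceil + 1$, and thus

$$\frac{1}{2}\sum_{k=1}^{k'} k|V_k| \le 1 + \frac{1}{2}\sum_{k=1}^{k'} k\left(Cnk^{-\alpha} + 1\right) = 1 + \frac{k'(k'+1)}{4} + \frac{1}{2}\sum_{k=1}^{k'} Cnk^{-\alpha+1} \le O(n^{2/\alpha}) + \frac{1}{2}Cn\zeta(\alpha - 1) = O(n).$$

□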


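Proposition 3. $\mathcal{P}'_\alpha \subseteq \mathcal{P}_\alpha$.

Proof. Let $d = \left\lfloor\left(\frac{C}{\alpha-1} + 2\right)\sqrt[\alpha]{n} + i_1 + 3\right\rfloor$. For any $\alpha$-proper power-law graph with n vertices and for any k,
$|V_k| \le Ck^{-\alpha}n + 1$ and, by Proposition 1, $|V_k| = 0$ when k > d.

Let k be an arbitrary integer between $\sqrt[\alpha]{n/\log n}$ and $n - 1$. We need to show that $\sum_{i=k}^{n-1} |V_i| \le C'\left(\frac{n}{k^{\alpha-1}}\right)$.
It suffices to show this for $k \le d$. We have:

$$\sum_{i=k}^{n-1} |V_i| \le \sum_{i=k}^{d} \left(Cni^{-\alpha} + 1\right) = d - k + 1 + Cn\sum_{i=k}^{d} i^{-\alpha} \le \left(\frac{C}{\alpha-1} + 5\right)\sqrt[\alpha]{n} + Cn\int_{k}^{\infty} x^{-\alpha}\,dx$$

$$= \left(\frac{C}{\alpha-1} + 5\right)\sqrt[\alpha]{n} + \frac{C}{\alpha-1}\,nk^{-\alpha+1} \le \left(\left(\frac{C}{\alpha-1} + 5\right)^{\alpha} + \frac{C}{\alpha-1}\right)nk^{-\alpha+1} \le C'nk^{-\alpha+1},$$

as desired.

□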


3 The Labeling Schemes 

We now construct algorithms for labeling schemes for c-sparse graphs and for the family $\mathcal{P}_\alpha$. Both labeling
schemes partition vertices into thin vertices, which are of low degree, and fat vertices, which are of high degree. The
degree threshold for the scheme is the lowest possible degree of a fat vertex. We start with c-sparse graphs.

Theorem 1. There is a $\sqrt{2cn\log n} + 2\log n + 1$ labeling scheme for $S_{c,n}$.

Proof. Let G = (V, E) be an n-vertex c-sparse graph. Let f(n) be the degree threshold for n-vertex graphs;
we choose f(n) below. Let k denote the number of fat vertices of G, and assign to each fat vertex a
unique identifier between 1 and k. Each thin vertex is given a unique identifier between k + 1 and n.

For a vertex $v \in V$, the first part of the label $\mathcal{L}(v)$ is a single bit indicating whether v is thin or fat followed by
a string of log n bits representing its identifier. If v is thin, the last part of $\mathcal{L}(v)$ is the concatenation of the
identifiers of the neighbors of v. If v is fat, the last part of $\mathcal{L}(v)$ is a fat bit string of length k where the ith
bit is 1 iff v is adjacent to the (fat) vertex with identifier i.

Decoding a pair $(\mathcal{L}(u), \mathcal{L}(v))$ is now straightforward: if one of the vertices, say u, is thin, u and v are
adjacent iff the identifier of v is part of the label of u. If both u and v are fat then they are adjacent iff the
ith bit of the fat bit string of $\mathcal{L}(u)$ is 1 where i is the identifier of v.

Since $|E| \le cn$, we have $k \le 2cn/f(n)$. A fat vertex thus has label size $1 + \log n + k \le 1 + \log n + 2cn/f(n)$
and a thin vertex has label size at most $1 + \log n + f(n)\log n$. To minimize the maximum possible label size,
we solve $2cn/x = x\log n$. Solving this gives $x = \sqrt{2cn/\log n}$ and setting $f(n) = \lceil x \rceil$ gives a label size of at
most $1 + \log n + (\sqrt{2cn/\log n} + 1)\log n \le 1 + 2\log n + \sqrt{2cn\log n}$. □
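The encoder and decoder from the proof of Theorem 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: labels are kept as Python tuples rather than packed bit strings, vertices are assumed to be integers, and the names `encode`, `adjacent`, `label_bits` and `sparse_threshold` are hypothetical.

```python
import math


def encode(adj, threshold):
    """adj maps each vertex to the set of its neighbours (simple, undirected).

    Vertices of degree >= threshold are fat and get identifiers 1..k;
    the remaining (thin) vertices get identifiers k+1..n.
    """
    fat = sorted(v for v in adj if len(adj[v]) >= threshold)
    thin = sorted(v for v in adj if len(adj[v]) < threshold)
    ident = {v: i for i, v in enumerate(fat + thin, start=1)}
    labels = {}
    for v in fat:
        # Fat bit string: position i-1 tells whether v is adjacent to fat vertex i.
        bits = tuple(1 if w in adj[v] else 0 for w in fat)
        labels[v] = ("fat", ident[v], bits)
    for v in thin:
        # Thin vertices store the identifiers of all their neighbours.
        labels[v] = ("thin", ident[v], frozenset(ident[w] for w in adj[v]))
    return labels


def adjacent(label_u, label_v):
    """Decide adjacency from the two labels alone."""
    kind_u, id_u, data_u = label_u
    kind_v, id_v, data_v = label_v
    if kind_u == "thin":
        return id_v in data_u
    if kind_v == "thin":
        return id_u in data_v
    return data_u[id_v - 1] == 1


def label_bits(label, n):
    """Label size in bits: 1 type bit + identifier + neighbour list or bit string."""
    kind, _, data = label
    log_n = math.ceil(math.log2(n))
    body = len(data) if kind == "fat" else len(data) * log_n
    return 1 + log_n + body


def sparse_threshold(n, c):
    """Threshold f(n) = ceil(sqrt(2cn / log n)) used in the proof."""
    return math.ceil(math.sqrt(2 * c * n / math.log2(n)))
```

A fat vertex answers queries about thin neighbours through the thin vertex's label, which is why its own label only needs the k-bit string over fat vertices.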

By Proposition 2, graphs in $\mathcal{P}'_\alpha$ are sparse for $\alpha > 2$. This gives a label size of $O(\sqrt{n\log n})$ with the
labeling scheme in Theorem 1. We now show that this label size can be significantly improved, by constructing
a labeling scheme for $\mathcal{P}_\alpha$, which contains $\mathcal{P}'_\alpha$.

Theorem 2. There is a $\sqrt[\alpha]{C'n}(\log n)^{1-1/\alpha} + 2\log n + 1$ labeling scheme for $\mathcal{P}_\alpha$.

Proof. The proof is very similar to that of Theorem 1. We let f(n) denote the degree threshold. If we pick
$f(n) \ge \sqrt[\alpha]{n/\log n}$ then by Definition 1, there are at most $C'n/f(n)^{\alpha-1}$ fat vertices. Defining labels in the
same way as in Theorem 1 gives a label size for thin vertices of at most $1 + \log n + f(n)\log n$ and a label
size for fat vertices of at most $1 + \log n + C'n/f(n)^{\alpha-1}$. We minimize by solving $x\log n = C'n/x^{\alpha-1}$, giving
$x = \sqrt[\alpha]{C'n/\log n}$. Setting $f(n) = \lceil x \rceil$ gives a label size of at most $\sqrt[\alpha]{C'n}(\log n)^{1-1/\alpha} + 2\log n + 1$. □
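For readers who want the balancing step spelled out, the following short derivation shows how the threshold and the resulting label size arise; it only uses the quantities from the proof and the bound $\lceil x \rceil \le x + 1$.

```latex
\begin{align*}
x \log n = \frac{C' n}{x^{\alpha-1}}
  &\;\Longrightarrow\; x^{\alpha} = \frac{C' n}{\log n}
  \;\Longrightarrow\; x = \sqrt[\alpha]{\frac{C' n}{\log n}}, \\
x \log n
  &= (C' n)^{1/\alpha} (\log n)^{1 - 1/\alpha}
   = \sqrt[\alpha]{C' n}\,(\log n)^{1 - 1/\alpha}, \\
\text{thin:}\quad 1 + \log n + \lceil x \rceil \log n
  &\le 1 + 2\log n + \sqrt[\alpha]{C' n}\,(\log n)^{1 - 1/\alpha}, \\
\text{fat:}\quad 1 + \log n + \frac{C' n}{\lceil x \rceil^{\alpha-1}}
  &\le 1 + \log n + \frac{C' n}{x^{\alpha-1}}
   = 1 + \log n + \sqrt[\alpha]{C' n}\,(\log n)^{1 - 1/\alpha}.
\end{align*}
```

For $\alpha = 2$ the main term becomes $\sqrt{C' n \log n}$, matching the shape of the bound in Theorem 1.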
4 Lower Bounds


We now derive lower bounds for the label size of any labeling scheme for both $S_{c,n}$ and $\mathcal{P}_\alpha$. Our proofs
rely on Moon’s [44] lower bound of $\lfloor n/2 \rfloor$ bits for labeling schemes for general graphs. We first show that the
upper bound achieved for sparse graphs is close to the best possible. The following proposition is essentially
a more precise version of the lower bound suggested by Spinrad [48].


Proposition 4. Any labeling scheme for $S_{c,n}$ requires labels of size at least $\frac{\sqrt{cn}}{2}$ bits.

Proof. Assume for contradiction that there exists a labeling scheme assigning labels of size strictly less than
$\frac{\sqrt{cn}}{2}$. Let G be an n-vertex graph. Let G' be the graph resulting by adding $\lfloor n^2/c \rfloor - n$ isolated vertices to
G, and note that now G' is c-sparse. The graph G is an induced subgraph of G'. It now follows that the
vertices of G have labels of size strictly less than $\frac{\sqrt{c\lfloor n^2/c \rfloor}}{2} \le n/2$ bits. As G was arbitrary, we obtain a
contradiction. □


4.1 Lower bound for power-law graphs 

In the remainder of this section we assume that $\alpha > 2$ and prove the following:

Theorem 3. For all n, any labeling scheme for n-vertex graphs of $\mathcal{P}_\alpha$ requires label size $\Omega(\sqrt[\alpha]{n})$.

More precisely, we present a lower bound for $\mathcal{P}'_\alpha$, which is contained in $\mathcal{P}_\alpha$. Let $n \in \mathbb{N}$ be given and let
$H = (V(H), E(H))$ be an arbitrary graph with $i_1$ vertices, where $i_1 = \Theta(\sqrt[\alpha]{n})$ is defined as in Section 2.

We show how to construct an $\alpha$-proper power-law graph G = (V, E) with n vertices that contains H as an
induced subgraph. Observe that a labeling of G induces a labeling of H. As H was chosen arbitrarily and
as any labeling scheme for k-vertex graphs requires $\lfloor k/2 \rfloor$ label size in the worst case, Theorem 3 follows if
we can show the existence of G.

We construct G incrementally where initially $E = \emptyset$. Partition V into subsets $V_1, \ldots, V_n$ as follows. The
set $V_1$ has size $\lfloor Cn \rfloor - i_1$. For $i = 2, \ldots, i_1 - 1$, $V_i$ has size $\lfloor Cn/i^\alpha \rfloor$. Letting $n' = \sum_{i=1}^{i_1-1} |V_i|$, we set the size
of $V_i$ to 1 for $i = i_1, \ldots, i_1 + n - n' - 1$ and the size of $V_i$ to 0 for $i = i_1 + n - n', \ldots, n$, thereby ensuring
that the sum of sizes of all sets is n. Observe that $\sum_{i=1}^{i_1-1} \lfloor Cn/i^\alpha \rfloor \le n$ so that $n' \le n - i_1$, implying that
$n - n' \ge i_1$. Hence we have at least $i_1$ size-1 subsets $V_{i_1}, \ldots, V_{i_1+n-n'-1}$, in each of which the vertex degree
allowed by Definition 2 is at least $i_1$.

Let $v_1, \ldots, v_{i_1}$ be an ordering of V(H), form a set $V_H \subseteq V$ of $i_1$ arbitrary vertices from the sets
$V_{i_1}, \ldots, V_{i_1+n-n'-1}$, and choose an ordering $v'_1, \ldots, v'_{i_1}$ of $V_H$. For all $i, j \in \{1, \ldots, i_1\}$, add edge $(v'_i, v'_j)$
to E iff $(v_i, v_j) \in E(H)$. Now, H is an induced subgraph of G and since the maximum degree of H is $i_1 - 1$,
no vertex of $V_i$ exceeds the degree bound allowed by Definition 2 for $i = 1, \ldots, n$.

We next add additional edges to G in three phases to ensure that it is an $\alpha$-proper power-law graph while
maintaining the property that H is an induced subgraph of G. For $i = 1, \ldots, n$, during the construction of
G we say that a vertex $v \in V_i$ is unprocessed if its degree in the current graph G is strictly less than i. If
the degree of v is exactly i, v is processed.

Phase 1: Let $V' = V \setminus (V_1 \cup V_H)$. Phase 1 is as follows: while there exists a pair of unprocessed vertices
$(u, v) \in V' \times V_H$, add (u, v) to E.

When Phase 1 terminates, H is clearly still an induced subgraph of G. Furthermore, all vertices of $V_H$
are processed. To see this, note that the sum of degrees of vertices of $V_H$ when they are all processed is
$O(i_1^2) = O(n^{2/\alpha})$, which is o(n) since $\alpha > 2$. Furthermore, prior to Phase 1, each of the $\Theta(n)$ vertices of $V'$
has degree 0 and can thus have its degree increased by at least 1 before being processed.
Phase 2: Phase 2 is as follows: while there exists a pair of unprocessed vertices $(u, v) \in V' \times V'$, add (u, v)
to E. At termination, at most one vertex of $V'$ remains unprocessed. If such a vertex exists we process it
by connecting it to $O(\sqrt[\alpha]{n})$ vertices of $V_1$; as $|V_1| = \Theta(n)$ there are enough vertices of $V_1$ to accommodate this.
Furthermore, prior to adding these edges, all vertices of $V_1$ have degree 0, and hence the bound allowed for
vertices of this set is not exceeded.

Phase 3: In Phase 3, we add edges between pairs of unprocessed vertices of $V_1$ until no such pair exists.
If no unprocessed vertices remain we have the desired $\alpha$-proper power-law graph G. Otherwise, let $w \in V_1$
be the unprocessed vertex of degree 0. We add a single edge from w to another vertex w' of $V_1$, thereby
processing w and moving w' from $V_1$ to $V_2$. Note that the sizes of $V_1$ and $V_2$ are kept in their allowed ranges
due to the first two conditions in Definition 2. This proves Theorem 3.


5 Scale Free Graphs from Generative Models 

The Barabasi-Albert (BA) model is a well-known generative model for power-law graphs that, roughly, grows
a graph in a sequence of time steps by inserting a single vertex at each step and attaching it to m existing
vertices with probability weighted by the degree of each existing vertex [15]. The BA model generates graphs
that asymptotically have a power-law degree distribution ($\alpha = 3$) for low-degree nodes [T7]. Graphs created
by the BA model have low arboricity [31] (the arboricity of a graph is the minimum number of spanning forests
needed to cover its edges); we use that fact to prove the following highly efficient labeling scheme.

Proposition 5. The family of graphs generated by the BA model has an O(m log n) adjacency labeling
scheme.

Proof. Let G = (V, E) be an n-vertex graph resulting from the construction by the BA model with some
parameter m (starting from some graph $G_0 = (V_0, E_0)$ with $|V_0| \ll n$). While it is not known how to
compute the arboricity of a graph efficiently, it is possible in near-linear time to compute a partition of G
with at most twice the number of forests in comparison to the optimal El. We can thus decompose the
graph into 2m forests in near-linear time and label each forest using Alstrup and Rauhe’s [10] $\log n + O(\log^* n)$
labeling scheme for trees, and achieve a $2m(\log n + O(\log^* n))$ labeling scheme for G. □

Note that if the encoder operates at the same time as the creation of the graph, Proposition 5 can be
strengthened to yield an m log n labeling scheme: simply store the identifiers of the m vertices attached
with every vertex insertion. Theorem 3 and Proposition 5 strongly suggest that, for each sufficiently large n,
the number of power-law graphs with n vertices is vastly larger than the number of graphs with n vertices
created by the BA model. In contrast, other generative models such as Waxman [i^, N-level Hierarchical m- 
and Chung’s [23] (Chapter 3) do not seem to have an obvious smaller label size than the one in Proposition]^ 
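The remark about an encoder that runs alongside graph creation can be illustrated with a small simulation. This is a sketch under stated assumptions, not the BA model as specified in the literature: the seed graph is assumed to be a clique on m + 1 vertices, the m attachment targets are drawn without repetition by rejection, and the names `grow_and_label` and `adjacent` are hypothetical.

```python
import random


def grow_and_label(n, m, seed=0):
    """Grow a BA-style graph on vertices 0..n-1 and label it on the fly.

    Returns (labels, edges) where labels[v] = (v, identifiers stored when v
    was inserted) and edges is the set of pairs (u, v) with u < v.
    """
    rng = random.Random(seed)
    # Assumed seed graph: a clique on vertices 0..m. Each seed vertex stores
    # its smaller-numbered neighbours, i.e. at most m identifiers.
    labels = {v: (v, frozenset(range(v))) for v in range(m + 1)}
    edges = {(u, v) for v in range(m + 1) for u in range(v)}
    # Every vertex appears in `pool` once per incident edge, so a uniform
    # draw from `pool` is a degree-weighted draw over vertices.
    pool = [x for e in sorted(edges) for x in e]
    for v in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        # The label of v is fixed at insertion time: its own identifier
        # plus the identifiers of the m vertices it attaches to.
        labels[v] = (v, frozenset(targets))
        for u in sorted(targets):
            edges.add((u, v))
            pool.extend((u, v))
    return labels, edges


def adjacent(label_u, label_v):
    """u and v are adjacent iff one of them stored the other's identifier."""
    id_u, stored_u = label_u
    id_v, stored_v = label_v
    return id_v in stored_u or id_u in stored_v
```

Each label holds one identifier for the vertex itself and at most m stored identifiers, so labels never need to be rewritten when later vertices arrive.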

6 Experimental Study 

We now perform an experimental evaluation of our labeling scheme on a number of large networks. The 
source code for our experiments can be found at: www.diku.dk/~simonsen/suppmat/podcl5/powerlaw.zip 

6.1 Experimental Framework 

Performance Indicators. Recall that our labeling scheme consists of two ideas: separation of the nodes
according to some threshold, and selecting a threshold depending on the power-law parameter $\alpha$. In our labeling
scheme, the threshold is $\lceil\sqrt[\alpha]{Cn/(\alpha - 1)}\rceil$. We call this the predicted threshold; it is an approximation to
the theoretically optimal threshold choice when degree distributions follow the power-law curve $k \mapsto Cn/k^\alpha$
perfectly. The approximation uses integration similar to what is done in, e.g., the proof of Proposition 3. For
a concrete graph G, it is conceivable that some other threshold $n_0$, different from the predicted threshold,
would yield a labeling scheme with smaller size. Let $\max_t(n_0)$ and $\max_f(n_0)$ be the maximum label sizes of
thin, resp. fat, vertices in G when the threshold is set at $1 \le n_0 \le n - 1$. Clearly the maximum label size
with the threshold $n_0$ is $\max\{\max_t(n_0), \max_f(n_0)\}$. Observe further that $\max_t(n_0)$ and $\max_f(n_0)$ are
monotonically increasing, resp. decreasing, functions of $n_0$. Hence, the $n_0$ for which $\max\{\max_t(n_0), \max_f(n_0)\}$ is
minimal is where the curves of $\max_t(n_0)$ and $\max_f(n_0)$ intersect. We call this $n_0$ the empirical threshold.
We set up the following performance indicators to gauge (1) the difference in label size with predicted and
empirical threshold, and (2) the label size obtained by our labeling scheme on several data sets, compared
to other labeling schemes.

Footnote to the proof of Proposition 5: more precisely, for any $\epsilon \in (0, 1)$ there exists an $O(|E(G)|/\epsilon)$ algorithm [39] that computes such a partition using at most
$(1 + \epsilon)$ times more forests than the optimal.
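The search for the empirical threshold can be sketched directly from a degree sequence. The sketch below assumes the label layout from the proof of Theorem 1 (a k-bit fat bit string), not the concatenation variant used in the experiments, and it scans all thresholds naively; the function names are hypothetical.

```python
import math


def max_label_sizes(degrees, threshold):
    """Return (max_t, max_f): the largest thin and fat label size in bits.

    Thin label: 1 + log n + deg * log n.  Fat label: 1 + log n + k,
    where k is the number of fat vertices (degree >= threshold).
    """
    n = len(degrees)
    log_n = math.ceil(math.log2(n))
    thin = [d for d in degrees if d < threshold]
    num_fat = n - len(thin)
    max_thin = 1 + log_n + max(thin, default=0) * log_n
    max_fat = 1 + log_n + num_fat if num_fat else 0
    return max_thin, max_fat


def empirical_threshold(degrees):
    """Threshold in 1..n-1 minimising max(max_t, max_f).

    Quadratic-time scan, fine for a sketch; for large inputs one would
    only try thresholds that occur as degrees (plus one).
    """
    candidates = range(1, len(degrees))
    return min(candidates, key=lambda t: max(max_label_sizes(degrees, t)))


if __name__ == "__main__":
    degrees = [1] * 8 + [2] * 2 + [5, 5]
    t = empirical_threshold(degrees)
    print(t, max_label_sizes(degrees, t))
```

Because the thin curve only grows and the fat curve only shrinks with the threshold, the minimum found by the scan sits at or next to their crossing point.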

Performance Indicator 1: We measure the label sizes for the labeling schemes with the predicted and
empirical thresholds. We interpret a small relative difference between these label sizes as meaning that the
predicted threshold can achieve small label sizes without examining the global properties of the network
other than the power-law parameter $\alpha$.

Performance Indicator 2: We compare the label sizes attained by our labeling schemes to those of other labeling
schemes, namely state-of-the-art labeling schemes for the classes of bounded-degree, sparse and general
graphs, using the labeling schemes suggested in [3], Theorem 1 and [S]. We interpret small label sizes for
our scheme, especially in comparison with “small” classes like the class of bounded-degree graphs, as a sign
that our labeling scheme efficiently utilizes the extra information about the graphs: namely that their degree
distribution is reasonably well-approximated by a power-law.


Test Sets. We employ both real-world and synthetic data sets. 

The six synthetic data sets are created by first generating a power-law degree sequence using the method
of Clauset et al. [25, App. D], and subsequently constructing a corresponding graph for the sequence using the
Havel-Hakimi method [33]. We use the range $2 < \alpha < 3$ as suggested in [25] as this range of $\alpha$ occurs most
commonly in modeling of real-world networks. We generate graphs of 300,000 and 1M vertices, denoted
$s300^{\alpha=x}$ and $s1M^{\alpha=x}$ respectively, for $x \in \{2.2, 2.4, 2.6, 2.8\}$.
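The second step, realising a degree sequence as a simple graph, can be sketched with the Havel-Hakimi procedure. The degree sampler shown alongside it is a plain weighted draw and is not the method of Clauset et al.; it is only an assumed stand-in so that the example is self-contained, and both function names are hypothetical.

```python
import random


def havel_hakimi(degrees):
    """Return an edge list realising `degrees` as a simple graph,
    or None if the sequence is not graphical."""
    remaining = [[d, v] for v, d in enumerate(degrees)]
    edges = []
    while True:
        # Largest residual degree first.
        remaining.sort(reverse=True)
        d, v = remaining[0]
        if d == 0:
            return edges
        if d > len(remaining) - 1:
            return None
        remaining[0][0] = 0
        # Connect v to the d vertices with the largest residual degrees.
        for other in remaining[1:d + 1]:
            if other[0] == 0:
                return None
            other[0] -= 1
            edges.append((v, other[1]))


def sample_degrees(n, alpha, rng, k_max=None):
    """Draw n degrees with P(k) proportional to k^-alpha (stand-in sampler)."""
    k_max = k_max or n - 1
    ks = range(1, k_max + 1)
    weights = [k ** -alpha for k in ks]
    degs = rng.choices(ks, weights=weights, k=n)
    if sum(degs) % 2:
        degs[0] += 1  # a graphical sequence must have an even degree sum
    return degs


if __name__ == "__main__":
    degs = sample_degrees(1000, 2.5, random.Random(0))
    edges = havel_hakimi(degs)
    print("graphical" if edges is not None else "not graphical")
```

A sampled sequence is not guaranteed to be graphical, which is why the sketch reports failure with `None` instead of raising.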

The three real-world data sets originate from articles that found the data to be well-approximated by
a power-law. The WWW data set [Bj contains information on links between webpages within the nd.edu
domain. The ENRON data set m contains email communication between Enron employees (vertices are
email addresses; there is a link between two addresses if a mail has been sent between them). The INTERNET
data set [15] provides a snapshot of the Internet structure at the level of autonomous systems, reconstructed
from BGP tables. For all of these sets, we consider the underlying simple, undirected graphs. For each set,
standard maximum likelihood methods were used to compute the parameter $\alpha$ of the best-fitting power-law
curve [2^. Additional information on the data sets can be found in Table 1.


Data set              |V|          |E|          $\alpha$   $\Delta_{\max}$   Source

Real-Life
WWW                   325,729      1,117,563    2.16       10,721            il
ENRON                 36,692       183,830      1.97       1,383             im
INTERNET              22,963       48,436       2.09       2,390             m

Synthetic
$s1M^{\alpha=2.4}$    1,000,000    1,127,797    2.4        42,683            -
$s1M^{\alpha=2.6}$    1,000,000    878,472      2.6        12,169            -
$s1M^{\alpha=2.8}$    1,000,000    751,784      2.8        1,692             -
$s300^{\alpha=2.2}$   300,000      491,926      2.2        10,906            -
$s300^{\alpha=2.4}$   300,000      327,631      2.4        3,265             -
$s300^{\alpha=2.6}$   300,000      261,949      2.6        1,410             -
$s300^{\alpha=2.8}$   300,000      227,247      2.8        1,842             -

Table 1: Data sets and their properties. All graphs are undirected and simple. $\Delta_{\max}$ is the maximum degree
of any vertex in the data set.
6.2 Findings

Figure 1 shows the distribution of maximum label sizes for one synthetic and one real-world data set. The
maximum label size for the predicted and empirical thresholds as well as upper bounds on the label sizes
from different label schemes in the literature can be seen in Table 2 for two synthetic data sets and all three
real-world data sets. Plots for the remaining data sets can be found in Appendix A.

Figure 1: Maximum label sizes for different threshold values for the syn300“^^'^ and ENRON data sets. The
triangles and crosses indicate that, for the tested threshold, the largest label belongs to a fat, resp. thin, node.
The star indicates the position of the predicted threshold.


Table 2 shows the maximum label sizes achieved using different labeling schemes on our data sets.
“Predicted” shows the experimental maximum label size obtained by running our scheme on the graphs,
“Empirical” is the label size attained by using the empirical threshold. The remaining columns show
non-experimental upper bounds for different label schemes: “Bound” is the upper bound guaranteed in Theorem 2,
“C-sparse” is the labeling scheme for sparse graphs defined in Theorem 1, “BD” is the $\lfloor\Delta/2\rfloor\lceil\log n\rceil$
bounded-degree graph labeling of [3], and AKTZ is the $\lfloor n/2 \rfloor + 6$ general graph labeling of [1]. Both
“Empirical” and “Bound” use simple concatenation of labels to represent the fat bit string.
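As a sanity check on the two non-experimental columns for general schemes, the following sketch recomputes the “BD” and “AKTZ” entries for the real-world data sets from the vertex counts and maximum degrees in Table 1. The placement of floors and ceilings ($\lfloor\Delta/2\rfloor\lceil\log_2 n\rceil$ and $\lfloor n/2\rfloor + 6$) is inferred here from the tabulated numbers and is an assumption; the function names are hypothetical.

```python
import math

# (name, n, max degree) copied from Table 1; (BD, AKTZ) copied from Table 2.
ROWS = [
    ("WWW", 325_729, 10_721, 101_840, 162_870),
    ("ENRON", 36_692, 1_383, 11_056, 18_352),
    ("INTERNET", 22_963, 2_390, 17_925, 11_487),
]


def bd_bound(n, max_degree):
    """Bounded-degree bound as tabulated: floor(Delta / 2) * ceil(log2 n)."""
    return (max_degree // 2) * math.ceil(math.log2(n))


def aktz_bound(n):
    """General-graph bound as tabulated: floor(n / 2) + 6."""
    return n // 2 + 6


if __name__ == "__main__":
    for name, n, max_degree, bd, aktz in ROWS:
        print(name, bd_bound(n, max_degree) == bd, aktz_bound(n) == aktz)
```

The same two formulas also reproduce the synthetic rows, e.g. a maximum degree of 42,683 on 1,000,000 vertices gives 426,820 and 500,006.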

Our findings are as follows. For Performance Indicator (i), our labeling scheme obtains maximum label
size at most 3% larger than what would have been obtained by using the empirical threshold for all synthetic
data sets. This is expected, as the synthetic data sets are graphs generated specifically to have a power-law
degree distribution. For the real-world data sets, the labeling scheme obtains maximum label
size at most 23% larger than by using the empirical threshold; this larger deviation is likely due to degree
distributions of the data sets being close to, but not quite, power-law distributions due to natural phenomena
or noise. E.g., for the ENRON data set there is a sudden drop in frequency between nodes of degree < 158 and
> 158.

For Performance Indicator (ii), both our experimental results and theoretical upper bounds for our
labeling scheme are several orders of magnitude lower than for labeling schemes aimed at more general
classes of graphs, as expected. Of the more general classes of graphs, it is most interesting to compare the
upper bound of bounded degree graphs, the most restrictive class of graphs that both contains the class of
power-law graphs and has an efficient labeling scheme described in the literature [3]. As seen in Table 2,
our labeling schemes for both power-law graphs and sparse graphs have better upper
bounds on label sizes, but only marginally so for data sets with low maximum degree and low values of the
power-law parameter $\alpha$, e.g. ENRON ($\alpha = 1.97$). It is interesting to note that the actual label sizes obtained
in the experiments (the two leftmost columns of Table 2) are substantially lower than the upper bounds,
that is, the labeling scheme performs much better in practice than suggested by theory (down to less than a
kilobyte per vertex for all data sets). This phenomenon may be due to the degree distribution of the graphs
of the data sets having only minor deviation from a power-law for small vertex degrees; our upper bounds on
the label size are derived by using the very rich family $\mathcal{P}_\alpha$ that allows very large deviation from a power-law
for degrees between 1 and $\sqrt[\alpha]{n/\log n} - 1$.

Footnote on the fat bit string: our labeling schemes introduced in this paper all make use of a succinctly represented “fat bit string”; for our experiments,
we use simple concatenation of labels instead of a bit string; this incurs a $(\log n)/\alpha$ factor on the label size, but simplifies the
implementation.

Data set              Predicted   Empirical   Bound     C-sparse   BD [3]     AKTZ

$s1M^{\alpha=2.4}$    4,841       4,821       25,012    30,079     426,820    500,006
$s1M^{\alpha=2.6}$    3,361       3,201       15,282    26,551     121,680    500,006
$s1M^{\alpha=2.8}$    2,101       2,061       10,081    24,566     16,920     500,006
$s300^{\alpha=2.2}$   4,523       4,447       24,878    18,885     103,607    150,006
$s300^{\alpha=2.4}$   2,775       2,680       14,404    15,420     31,008     150,006
$s300^{\alpha=2.6}$   1,958       1,920       9,151     13,792     13,395     150,006
$s300^{\alpha=2.8}$   1,350       1,312       6,244     12,849     17,499     150,006
WWW                   5,245       3,060       29,225    28,445     101,840    162,870
ENRON                 2,609       2,577       15,835    9,735      11,056     18,352
INTERNET              1,426       1,156       8,181     4,700      17,925     11,487

Table 2: Label size in bits of labeling schemes. The two leftmost columns are experimental results; the
remaining are upper bounds on label sizes computed from the characteristics of the data sets.

Finally, note that our labeling scheme supports adjacency for directed graphs by using one more bit per
edge in each label to store the edge orientation. For data sets whose natural interpretation is as a directed
graph (e.g., the WWW set where edges are outgoing and incoming links), the results of Table 2 thus carry
over with just one more bit added to the numbers in the two leftmost columns.


7 Conclusion and Future Work 

We have devised adjacency labeling schemes for sparse graphs and graphs whose degree distribution approximately
follows a power-law distribution. We have proven lower bounds for the class of power-law
graphs showing that our labeling scheme is almost asymptotically optimal. Furthermore, we have shown
experimentally that the labeling scheme for power-law graphs obtains results in practice requiring very little
space (labels smaller than a kilobyte per vertex for real-world graphs with several hundreds of thousands of
vertices).


7.1 Future work 


It would be of interest to test the performance of the labeling scheme on more real-world data sets, and in
particular to investigate dynamic labeling schemes on such sets: if vertices can enter and exit the network,
labels need to be recomputed efficiently. As our labeling scheme can be extended to handle directed graphs by
using a single bit more per label, it would be interesting to investigate the overhead incurred by distributing
the storage of the graph topology to the labels (as per our labeling scheme) compared to the substantial body
of work on storing directed power-law graphs directly in main memory (so-called “web-graph compression”)
[31 mm [21]. The label sizes attained in Sec. 6 can be reduced by using the succinctly represented “fat
bit string” as well as an additional rule that prevents storing an edge in two labels; doing so would yield
a small multiplicative reduction in label size, making our labeling scheme even more practical. Labeling
schemes for properties other than adjacency may be investigated for power-law graphs, e.g. for distance, as
has been done for other classes of graphs [8] and briefly considered for power-law graphs in the context of
routing algorithms [H]. Finally, labeling schemes for power-law graphs can likely be devised for the realistic
case where the scheme only has incomplete knowledge of the graph, for example when the expected frequency
of vertices of each degree is known, but not the exact frequency of each vertex.
A Experimental results in detail

Subsections A.1 and A.2 show the maximum label sizes for all synthetic and real-world data sets, respectively.

A.1 Maximum label size distribution for synthetic datasets
Figure 2: Distribution of maximum label sizes for four different synthetic datasets of |V| = 300,000. Each
dataset was generated using one of the $\alpha$-values 2.2, 2.4, 2.6, 2.8. Fat vertices are shown as red triangles
and thin vertices as blue crosses. The black pentagram shows the label size obtained by using the predicted
threshold. The transition between fat and thin vertices is the maximum label size obtained by using the
empirical threshold.
Figure 3: Distribution of maximum label sizes for three different synthetic datasets of |V| = 1,000,000.
Each dataset was generated using one of the $\alpha$-values 2.4, 2.6, 2.8. Fat vertices are shown as red triangles
and thin vertices as blue crosses. The black pentagram shows the label size obtained by using the predicted
threshold. The transition between fat and thin vertices is the maximum label size obtained by using the
empirical threshold.
A.2 Maximum label size distribution for real-life datasets

For completeness, we provide an illustration of the best-fitting power law fitted to the probability mass 
function of the data. 






(a) Fat and thin vertices vs. threshold values 


(b) Power law fit 


Figure 4: Left: Fat and thin vertices plotted against increasing threshold values for the WWW dataset. The
black pentagram shows the predicted threshold ($1/\zeta(\alpha)\,\sqrt[\alpha]{n}$) rounded to the nearest integer. Right: Best-fitting
power law ($\alpha = 2.16$) superimposed on the complementary cumulative distribution function (CCDF) using
the framework by [25].



(a) Fat and thin vertices vs. threshold values 


(b) Power law fit 


Figure 5: Left: Fat and thin vertices plotted against increasing threshold values for the ENRON email
communication dataset. The black pentagram is the predicted threshold ($1/\zeta(\alpha)\,\sqrt[\alpha]{n}$) rounded to the
nearest integer. Right: Best-fitting power law ($\alpha = 1.97$) superimposed on the complementary
cumulative distribution function (CCDF) using the framework by [25].
(a) Fat and thin vertices vs. threshold values


(b) Power law fit 


Figure 6: Left: Fat and thin vertices plotted against increasing threshold values for the INTERNET dataset.
The black pentagram is the predicted threshold ($1/\zeta(\alpha)\,\sqrt[\alpha]{n}$) rounded to the nearest integer. Right:
Best-fitting power law ($\alpha = 2.09$) superimposed on the complementary cumulative distribution function
(CCDF) using the framework by [25].